In this paper, we present a robust method for scene recognition, which leverages Convolutional Neural Network (CNN) features and sparse coding to create a new representation of indoor scenes. Although CNNs have greatly benefited the fields of computer vision and pattern recognition, convolutional layers adjust their weights with a global approach, which may lose important local details such as objects and small structures. Our proposed scene representation relies on both global features, which mostly capture the structure of the environment, and local features, which are sparsely combined to capture characteristics of objects common to a given scene. This new representation is based on fragments of the scene and leverages features extracted by CNNs. The experimental evaluation shows that the resulting representation outperforms previous scene recognition methods on the Scene15 and MIT67 datasets, and performs competitively on SUN397, while being highly robust to perturbations in the input image such as noise and occlusion.
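As a rough illustration of the kind of representation described above, the following sketch combines one global feature vector with sparse codes of local (fragment-level) features. It is a minimal sketch under stated assumptions, not the authors' implementation: the CNN is replaced by random vectors, the dictionary is random rather than learned, and the ISTA solver, the max pooling, and the final concatenation are illustrative choices that the abstract does not specify. All function and parameter names (`sparse_codes`, `scene_descriptor`, `lam`, `n_iter`) are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Sparse-code the rows of X over the rows of D with ISTA.

    Approximately minimises 0.5 * ||X - A @ D||_F^2 + lam * ||A||_1 over A.
    X: (n, d) local features; D: (k, d) dictionary atoms (assumed nonzero).
    """
    # Step size from the Lipschitz constant of the smooth term's gradient.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = soft_threshold(A - grad / L, lam / L)
    return A


def scene_descriptor(global_feat, local_feats, D, lam=0.1):
    """Concatenate a global feature with max-pooled local sparse codes.

    Pooling and concatenation are assumptions made for this illustration.
    """
    A = sparse_codes(local_feats, D, lam)
    pooled = np.abs(A).max(axis=0)  # one activation strength per atom
    return np.concatenate([global_feat, pooled])


# Stand-in data: in practice these would be CNN activations for the whole
# image (global) and for image fragments (local).
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms
global_feat = rng.standard_normal(128)
local_feats = rng.standard_normal((10, 64))
desc = scene_descriptor(global_feat, local_feats, D)
```

Here `desc` has one entry per global feature dimension followed by one entry per dictionary atom, so a standard classifier could be trained on it directly.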